In a recent paper published online in Molecular Psychiatry, Skafidas 
et al. 1 report a classifier for identifying individuals at risk for autism 
spectrum disorders (ASDs). Their classifier is based on 237 
single-nucleotide polymorphisms (SNPs) that were selected from the 
results of a pathway analysis using cases from the Autism Genetic 
Resource Exchange (AGRE). 1 Using within-sample cross-validation, 
the authors claim a classification accuracy for ASDs of 85.6%. They 
subsequently applied their classifier to ASD cases from the Simons 
Foundation Autism Research Initiative (SFARI) and controls from 
the Wellcome Trust Birth Cohort (WTBC) and report ASD 
classification accuracy of 71.7%. 

We believe that the claims made by Skafidas et al. 1 are 
inconsistent with current knowledge of the genetics of ASDs, 2 and 
inconsistent with the expected precision of risk predictions for 
complex psychiatric disorders. Further, as classification accuracy 
depends on the size of the discovery sample, the results are also 
inconsistent with the size of the sample they employed (only 123 
controls were included in the discovery set). 

To examine the validity of Skafidas et al.'s claims, we pursued a 
range of analyses to assess the evidence for association between 
ASDs and (1) the individual SNPs named in their paper as most 
predictive, (2) their genetic classifier, to the extent it was described 
and (3) the pathways identified in the report, from which the 
predictive SNPs were selected. For each analysis, where possible, 
we attempted to replicate the analytic approach of Skafidas et al. 1 
using data from the Psychiatric Genomics Consortium (PGC) 
autism group, which includes ~5400 cases, more than three times 
the number used in the original report. The methodology of these 
analyses is described in detail in Supplementary Information. 

First, we found no evidence for single SNP associations between 
any of the 30 most contributory SNPs listed by Skafidas et al. 1 in 
their Table 2 and ASDs in the PGC (Table 1). In the current PGC 
meta-analysis, the mean P-value for these SNPs was 0.47 with a 
minimum of 0.007, and none is notable or survives a 30-SNP 
correction for multiple testing. Further information on these 
associations can be found in Supplementary Information. 
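
The multiple-testing check described above can be sketched as follows. This is a minimal illustration that assumes a Bonferroni correction at a family-wise error rate of 0.05 (the text says only "a 30 SNP correction", so both the method and the alpha level are assumptions). The P-values are those listed in Table 1.

```python
# P-values for the 30 SNPs, copied from the P-value column of Table 1.
p_values = [
    0.510, 0.422, 0.398, 0.167, 0.974, 0.594, 0.710, 0.736, 0.097, 0.007,
    0.408, 0.058, 0.273, 0.085, 0.142, 0.894, 0.513, 0.450, 0.279, 0.241,
    0.415, 0.804, 0.581, 0.048, 0.966, 0.282, 0.966, 0.981, 0.217, 0.964,
]

alpha = 0.05  # assumed family-wise error rate; not stated in the text
bonferroni_threshold = alpha / len(p_values)  # assumed correction method

mean_p = sum(p_values) / len(p_values)
min_p = min(p_values)
survivors = [p for p in p_values if p < bonferroni_threshold]

print(f"mean P = {mean_p:.2f}, minimum P = {min_p}")
print(f"Bonferroni threshold = {bonferroni_threshold:.5f}")
print(f"SNPs surviving correction: {len(survivors)}")
```

Under these assumptions the smallest P-value (0.007) lies above the corrected threshold, so no SNP survives.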



Table 2. Pathway results from the PGC meta-analysis of ASDs 



KEGG pathway name                FORGE    INRICH   MAGENTA  SS       ALIGATOR
Purine metabolism                0.715    0.012    0.140    0.477    0.255
Calcium signaling                0.907    0.719    0.828    0.782    0.987
Chemokine signaling pathway      0.060    0.870    0.614    0.418    0.879
Phosphatidylinositol signaling   0.256    0.734    0.317    0.480    0.632
Oocyte meiosis                   0.986    0.522    0.743    0.771    0.301
Ubiquitin-mediated proteolysis   0.658    0.429    0.741    0.451    0.943
Wnt signaling                    0.863    0.480    0.626    0.408    0.552
Axon guidance                    0.611    0.502    0.289    0.083    0.654
Focal adhesion                   0.837    0.435    NA       0.685    0.374
Cell adhesion molecules          0.278    0.472    0.963    0.054    0.255
Gap junction                     0.786    0.768    0.780    0.676    0.926
LTM                              0.006    0.011    0.078    0.066    0.014
Long-term potentiation           0.937    0.883    0.961    0.742    0.969
Long-term depression             0.727    0.450    0.643    0.230    0.422
Taste transduction               0.510    1.000    0.900    0.670    0.692
Insulin signaling pathway        0.455    0.318    0.013    0.693    0.187
GnRH signaling                   0.357    0.589    0.658    0.575    0.927
Melanogenesis                    0.520    0.496    0.509    0.444    0.660

Abbreviations: ASD, autism spectrum disorder; GWAS, genome-wide 
association study; LTM, leukocyte transendothelial migration; NA, not 
applicable. Pathway results from the PGC Network and Pathway Analysis 
(PGC-NPA) group as applied to the meta-analysis results from PGC Autism. 
Five different methods are presented: FORGE, INRICH, MAGENTA, Set 
Screen (SS) and ALIGATOR. These methods have been documented 
elsewhere 6-10 and represent some of the leading methods for pathway 
analysis using GWAS data. None of the pathways identified in the Skafidas 
paper survive a multiple-testing correction based on the PGC ASD meta- 
analysis. 



Table 1. Meta-analytic results for the 30 most predictive SNPs in the 
Skafidas classifier 



SNP         Chr  BP           A1  A2  ln(OR)   P-value
rs260808    11   103 909 166  A   C   -0.024   0.510
rs769052    5    138 944 433  T   C   -0.042   0.422
rs876619    16   56 283 534   A   C   0.044    0.398
rs905646    11   88 353 802   A   G   0.062    0.167
rs968122    12   70 791 615   T   C   0.001    0.974
rs984371    11   55 577 698   T   C   0.018    0.594
rs1243679   14   21 093 733   A   G   0.027    0.710
rs1818106   11   103 913 376  A   C   0.009    0.736
rs2239118   12   2 660 753    T   C   0.054    0.097
rs2240228   19   15 852 872   A   G   0.083    0.007
rs2300497   14   90 865 283   T   C   0.034    0.408
rs2384061   2    25 135 620   A   G   0.052    0.058
rs3773540   3    55 096 928   A   G   -0.085   0.273
rs4128941   17   63 531 331   A   G   -0.123   0.085
rs4308342   4    71 884 205   T   G   -0.107   0.142
rs4648135   4    103 536 670  A   G   0.008    0.894
rs6483362   11   88 412 451   A   G   -0.0335  0.513
rs7313997   12   71 265 958   A   C   0.035    0.450
rs7562445   2    213 192 048  T   G   0.042    0.279
rs7842798   8    131 890 170  A   G   0.033    0.241
rs8053370   16   56 262 906   T   C   -0.042   0.415
rs9288685   2    233 987 114  T   C   -0.007   0.804
rs10193128  2    233 987 722  T   C   -0.015   0.581
rs10409541  19   13 433 127   T   C   0.087    0.048
rs11020772  12   70 792 582   T   G   0.001    0.966
rs11145506  9    80 264 584   T   C   -0.117   0.282
rs12317962  12   70 792 582   T   G   0.001    0.966
rs12582971  12   18 459 387   T   C   -0.001   0.981
rs17629494  10   53 560 898   T   C   -0.060   0.217
rs17643974  10   126 792 798  T   C   0.002    0.964


Abbreviations: BP, base pair in HG19; Chr, chromosome; OR, odds ratio; SNP, 
single-nucleotide polymorphism. The SNP name, chromosome, base pair, 
reference allele, alternate allele, natural log of the odds ratio and P-value 
are presented from the meta-analysis of autism spectrum disorders from 
the Psychiatric Genomics Consortium. This meta-analytic strategy reflects 
the weighted combination of the contributing cohorts reflective of power 
to detect association. None of the SNPs meet a multiple-testing 
significance threshold, let alone the genome-wide association threshold 
of 5 × 10⁻⁸. 

Second, we examined the classification ability of the 30 SNPs 
disclosed in Skafidas et al. 1 (their Table 2) for ASDs in the PGC. We 
wrote to the authors, asking for the complete list of 237 SNPs and 
weights, but they declined to provide it. We 
accordingly built a classifier using the data for the 30 SNPs disclosed in 
Skafidas et al., 1 which the authors identify as the most influential 
(explaining approximately 58% of the total predictive power of the 
classifier). We constructed the classifier using two approaches. We 
initially used the weights provided by Skafidas et al. 1 and 
examined the predictive ability of the 30-SNP classifier in the 
full PGC autism sample. As described in detail in Supplementary 
Information, the classifier did not differ from chance in its ability to 
predict ASDs (area under the curve, AUC = 0.505, P = 0.22). 
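
To make the evaluation above concrete, the sketch below scores individuals with a weighted sum of genotype codes and computes the AUC as the probability that a randomly chosen case outscores a randomly chosen control. The weights, genotypes and function names are hypothetical toy values for illustration only; they are not the published weights or PGC data, and the actual analysis is described in Supplementary Information.

```python
from typing import Sequence


def risk_score(genotype: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of per-SNP genotype codes."""
    return sum(w * g for w, g in zip(weights, genotype))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data: minor-allele counts at 3 SNPs per individual.
weights = [0.5, -0.2, 0.1]
cases = [[2, 0, 1], [1, 1, 0], [0, 2, 2]]
controls = [[1, 0, 1], [2, 1, 0], [0, 1, 1]]

case_scores = [risk_score(g, weights) for g in cases]
control_scores = [risk_score(g, weights) for g in controls]
toy_auc = auc(case_scores, control_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination, which is the benchmark against which the reported value of 0.505 should be read.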

We then built the score using the SNP weights estimated from 
the PGC data. We randomly selected a set of 732 trios to build a 
classifier and then tested the predictive ability of the classifier in a 
distinct set of 243 trios (these numbers mirror those used by 
Skafidas et al. 1). For all trios, we created case/pseudo-control pairs 
to perform model building and validation, but otherwise followed 
the methods proposed in Skafidas et al. 1 (for example, using 0, 1, 3 
scoring against minor allele count). We repeated this procedure 
across 100 random samples of the same size from the PGC autism 
data. Across these replicates, we tested for a difference between 
case and control risk scores using a t-test (mean risk score of 
cases minus mean risk score of controls) and found an average 
t-statistic of 0.492 with an average P-value of 0.50 for the 
validation samples. We conclude that the classifier presented by 
Skafidas et al., 1 at least as constructed using the 30 top SNPs 
named in their report, does not generalize to predict ASDs in other 
samples. This result strongly suggests that the Skafidas et al. 1 
results cannot be used to predict ASDs. 
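
The family-based procedure can be sketched in a few lines. The example below is illustrative only and rests on several assumptions: the pseudo-control is formed from the two parental alleles not transmitted to the child, minor-allele counts of 0, 1 and 2 are recoded as 0, 1 and 3, and the comparison uses Welch's form of the t-statistic (the text does not say which form was used). The trios and weights are hypothetical, and a single evaluation is shown rather than the 100 random replicates.

```python
from math import sqrt
from statistics import mean, variance

# Assumed reading of "0, 1, 3 scoring": recode minor-allele counts 0/1/2 as 0/1/3.
CODE = {0: 0, 1: 1, 2: 3}


def pseudo_control(father, mother, child):
    """Minor-allele counts of the two parental alleles NOT transmitted to the child."""
    return [f + m - c for f, m, c in zip(father, mother, child)]


def score(genotype, weights):
    """Weighted sum of recoded genotypes."""
    return sum(w * CODE[g] for w, g in zip(weights, genotype))


def welch_t(a, b):
    """Welch t-statistic for mean(a) minus mean(b)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical trios: (father, mother, child) minor-allele counts at 3 SNPs.
trios = [
    ([1, 0, 2], [1, 1, 1], [2, 0, 2]),
    ([0, 1, 1], [2, 1, 0], [1, 1, 1]),
    ([1, 2, 0], [1, 1, 1], [0, 2, 1]),
    ([2, 0, 1], [0, 1, 1], [1, 0, 1]),
]
weights = [0.4, -0.3, 0.2]  # hypothetical; in practice estimated in the discovery trios

case_scores = [score(child, weights) for _, _, child in trios]
control_scores = [score(pseudo_control(f, m, c), weights) for f, m, c in trios]
t_stat = welch_t(case_scores, control_scores)
print(f"t = {t_stat:.3f}")
```

Because each pseudo-control shares its parents with the case, this design is not confounded by differences in ancestry between cases and controls.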

We repeated the set of analyses above using a case-control 
design, to mirror the approach employed by Skafidas et al. 1 We 
used 732 cases matched with 732 population controls for 
discovery, and 243 cases matched with 243 population controls 
for validation, much as the authors initially reported. In these 
comparisons, when principal components were included in the 
analysis to control for population ancestry, we observed nearly 
identical results to what we found in the family-based study 
described above (see Supplementary Information). However, 
without controlling for population ancestry, we observed a bias 
in estimates of the AUC, suggesting that such bias 
may have contributed to the results reported by Skafidas et al., 1 as 
has already been suggested. 3 
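
The mechanism behind such a bias can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not a reconstruction of either analysis: two hypothetical ancestry groups differ in allele frequency by 0.1 at each of 30 SNPs, no SNP has any effect on disease, and weights are simple case-minus-control differences in mean allele count. All names and parameter values are invented for illustration.

```python
import random


def genotype(freqs, rng):
    """Draw minor-allele counts (0, 1 or 2) per SNP under Hardy-Weinberg proportions."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def sample(freqs, n, rng):
    return [genotype(freqs, rng) for _ in range(n)]


def allele_means(group):
    return [sum(col) / len(group) for col in zip(*group)]


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = sum((c > k) + 0.5 * (c == k) for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))


rng = random.Random(0)
n_snps, n = 30, 300
# Hypothetical ancestry groups A and B; no SNP is associated with disease.
freq_a = [rng.uniform(0.2, 0.4) for _ in range(n_snps)]
freq_b = [p + rng.choice([-0.1, 0.1]) for p in freq_a]

# Discovery set confounded by ancestry: cases from group A, controls from group B.
disc_cases, disc_controls = sample(freq_a, n, rng), sample(freq_b, n, rng)
weights = [c - k for c, k in zip(allele_means(disc_cases), allele_means(disc_controls))]


def scores(group):
    return [sum(w * g for w, g in zip(weights, person)) for person in group]


# Validation with the same confounding versus ancestry-matched validation.
auc_confounded = auc(scores(sample(freq_a, n, rng)), scores(sample(freq_b, n, rng)))
auc_matched = auc(scores(sample(freq_a, n, rng)), scores(sample(freq_a, n, rng)))
print(f"confounded AUC = {auc_confounded:.2f}, matched AUC = {auc_matched:.2f}")
```

In this toy setting the classifier separates the two ancestry groups rather than cases from controls, so the AUC is well above 0.5 whenever validation cases and controls differ in ancestry and falls to about 0.5 when they are matched.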

Finally, we evaluated the significance of the pathways identified 
by Skafidas et al. 1 (their Table 1), the analysis that provided the 
basis for their SNP selection. We did not observe significant 
evidence for a relationship between any of these pathways and 
ASDs using five different pathway analysis tools in the combined 
PGC ASD sample set (Table 2). This result strongly suggests that 
the pathway analyses do not generalize to external samples and 
therefore cannot be validly used in the development of a classifier. 

To put the results reported in Skafidas et al. 1 into perspective, 
consider the magnitude of effects implied by the results of the 
classifier. From the external validation experiment, the authors 
report an area under the receiver operating characteristic curve 
of 0.747 (Skafidas et al., 1 Supplementary Figure S2). This result implies 
that their SNP set explains ~11% of variation in liability to ASDs 
(assuming a prevalence of 1% and a liability threshold model). 4 
For complex traits, in particular psychiatric disorders, explaining so 
much variation with so few SNPs and such a small discovery 
sample size (732 cases and 123 controls) is unprecedented, and 
inconsistent with results from genome-wide association studies. 
For example, to achieve similar levels of variance explained in 
human height, sample sizes of ~180 000 individuals were 
required. 5 
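
The step from an AUC of 0.747 to roughly 11% of liability variance can be reproduced with a short calculation. The sketch below uses one published conversion between AUC and variance explained on the liability scale under a threshold model (Wray and colleagues, 2010); that this is exactly the calculation behind reference 4 is an assumption, and the function name is ours.

```python
from statistics import NormalDist


def liability_r2_from_auc(auc: float, prevalence: float) -> float:
    """Variance in liability explained by a predictor, given its AUC and the
    population prevalence, under a liability threshold model (assumed formula)."""
    std_normal = NormalDist()
    q = std_normal.inv_cdf(auc)             # Q = inverse normal CDF of the AUC
    t = std_normal.inv_cdf(1 - prevalence)  # liability threshold T
    z = std_normal.pdf(t)                   # normal density at the threshold
    i = z / prevalence                      # mean liability of cases
    v = -i * prevalence / (1 - prevalence)  # mean liability of controls
    return 2 * q**2 / ((v - i) ** 2 + q**2 * i * (i - t) + v * (v - t))


r2 = liability_r2_from_auc(auc=0.747, prevalence=0.01)
print(f"variance in liability explained = {r2:.3f}")
```

With a prevalence of 1% this gives a value of roughly 0.11, in line with the figure quoted above.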

We find no evidence that the implicated SNPs, the classifier or 
the pathways named in Skafidas et al. 1 are associated with ASDs. 
We therefore conclude that the classifier, as presented, cannot be 
used in a general way to predict ASDs, and consequently is 
unlikely to have any translational value. 

Molecular Psychiatry (2014), 854-861 

The differences between the report of Skafidas et al. 1 and our 
analyses are striking. We suspect that our failures to replicate their 
claims originate from several issues with the original analyses and 
data. In particular, the failure to control for potential population 
stratification in Skafidas et al. 1 has likely led to biased estimates of 
allelic effects, as suggested in a recent letter. 3 We detail other 
technical issues in Supplementary Information, which may also 
explain the differences in the results. 

There are a great many challenges to the accurate interpretation 
of genomic data, and multiple false-positive associations from 
technical or study design biases have been identified in the 
literature. We conclude that the classifier presented in Skafidas 
et al. 1 will not usefully identify individuals at risk for ASDs in the 
population. Nevertheless, there are increasing numbers of robust 
and replicable findings emerging in psychiatric genetics. These 
findings hold great promise for understanding the biological basis 
of psychiatric disorders and for translation. 